feat(pt_expt): add dp finetune support #5331
wanghan-iapcm merged 7 commits into deepmodeling:master
Conversation
Add `--finetune`, `--model-branch`, and `--use-pretrain-script` support to `dp --pt-expt train`. The implementation mirrors the pt backend's finetune flow: load pretrained checkpoint, optionally change type map, selectively copy weights (descriptor always from pretrained, fitting conditionally), and adjust output bias. Also fix a bug in dpmodel's base_atomic_model.change_type_map where out_bias/out_std were not extended before remapping when the new type map introduces unseen types, causing an IndexError with negative remap indices.
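The `change_type_map` fix described above can be sketched as follows (the function name, array shapes, and remap convention here are illustrative assumptions, not the actual deepmd code): grow `out_bias`/`out_std` along the type axis *before* applying the remap, so that the negative indices assigned to unseen types land in the freshly appended slots instead of wrapping around into old types.

```python
import numpy as np

def extend_then_remap(out_bias, out_std, remap_index, has_new_type, n_new):
    """Sketch of the bug fix: extend per-type stats BEFORE remapping.

    out_bias/out_std have shape (nvar, ntypes, ndim); remap_index maps
    each slot of the new type map to an old type index, with negative
    indices for types unseen in the pretrained model (all names here
    are illustrative assumptions).
    """
    if has_new_type:
        extend_shape = (out_bias.shape[0], n_new, *out_bias.shape[2:])
        # zero bias and unit std for the previously unseen types
        out_bias = np.concatenate(
            [out_bias, np.zeros(extend_shape, dtype=out_bias.dtype)], axis=1
        )
        out_std = np.concatenate(
            [out_std, np.ones(extend_shape, dtype=out_std.dtype)], axis=1
        )
    # without the extension above, the negative indices below would
    # silently pick up an old type's statistics instead of the new slots
    return out_bias[:, remap_index], out_std[:, remap_index]
```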
Extend the finetune flow to accept .pte frozen models as the pretrained source, in addition to .pt checkpoints. The .pte file is loaded via serialize_from_file + BaseModel.deserialize to reconstruct the pretrained model with weights. Embed model_params in the .pte archive during freeze so that --use-pretrain-script works with .pte sources. Older .pte files without embedded model_params fall back to a minimal dict with just type_map. Add weight consistency checks to CLI tests (lr=1e-30 to prevent training from modifying weights) verifying descriptor and fitting weights match the pretrained model after finetune initialization.
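The weight-consistency check added to the CLI tests can be sketched like this (a numpy stand-in for the real torch state dicts; the helper name is hypothetical). With lr=1e-30 an optimizer step cannot perceptibly move the weights, so comparing against the pretrained state after a short finetune run still validates the initialization:

```python
import numpy as np

def assert_weights_inherited(ft_state, pre_state, random_fitting=False):
    """Sketch of the test's consistency check (names are assumptions):
    descriptor weights must always match the pretrained model, fitting
    weights only when they were not randomly re-initialized."""
    for key, pre in pre_state.items():
        if ".descriptor" in key:
            np.testing.assert_allclose(ft_state[key], pre)
        elif ".fitting" in key and not random_fitting:
            np.testing.assert_allclose(ft_state[key], pre)
```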
The DPA1 test_finetune_change_type bias-adjusted comparison failed because the two trainers (with different type maps) sampled different data frames for bias adjustment. The data set has 80 frames but data_stat_nbatch=1 sampled only 1 frame, and the frame selection depended on numpy RNG state which differed between the two trainers. Fix by subsampling the data to 2 frames in TestEnergyModelDPA1 and using batch_size=2 so all frames are consumed deterministically.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c1be2ec5ef
📝 Walkthrough

Adds pt_expt fine-tuning flow: expands type-map remap to grow per-type out_bias/out_std for new atom types, embeds/extracts model_params in frozen artifacts, introduces finetune rule extraction and Trainer logic for selective pretrained weight transfer and bias adjustment, and adds unit and e2e tests.
Sequence Diagram(s)

sequenceDiagram
actor User
participant CLI as CLI (entrypoint)
participant Config as Config
participant Finetune as FinetuneRules
participant Serializer as Serializer
participant Trainer as Trainer
participant Model as Model
User->>CLI: dp --pt-expt train --finetune model.pte --use-pretrain-script
CLI->>Config: load config & init model
CLI->>Finetune: get_finetune_rules(model.pte, model_config, model_branch)
Finetune->>Serializer: serialize_from_file(.pte/.pt)
Serializer-->>Finetune: model data + model_params
Finetune-->>CLI: finetune_links
CLI->>Trainer: Trainer(finetune_model, finetune_links)
Trainer->>Serializer: deserialize pretrained (.pte/.pt)
Trainer->>Trainer: determine resume vs finetune rules
Trainer->>Model: selective weight transfer (descriptor / fitting / _extra_state)
Trainer->>Model: check finetune_links.Default.get_has_new_type()
alt new types present
Trainer->>Model: change_type_map(new_type_map) -> expand out_bias/out_std + remap
end
Trainer->>Model: model_change_out_bias(sample_func, mode)
Trainer->>Model: start finetune training
Trainer->>Serializer: deserialize_to_file(output.pte, data, model_params)
Serializer-->>User: saved checkpoint (.pte) with embedded model_params
sequenceDiagram
participant Trainer
participant Pretrained as PretrainedCheckpoint
participant Target as TargetModel
participant Rule as FinetuneRule
Trainer->>Pretrained: load weights (.pte/.pt)
Trainer->>Rule: get_has_new_type()
alt New Types Detected
Trainer->>Target: change_type_map(new_type_map)
Target->>Target: expand out_bias/out_std then remap
end
Trainer->>Target: copy descriptor weights from pretrained
Trainer->>Rule: get_random_fitting()
alt Keep Random Fitting
Note over Target: keep random init for fitting params
else Use Pretrained Fitting
Trainer->>Target: copy fitting weights from pretrained
end
Trainer->>Target: change_out_bias(mode)
Target->>Target: adjust bias via statistics
Actionable comments posted: 5
Inline comments:
In `@deepmd/pt_expt/train/training.py`:
- Around line 384-385: The current logic sets resume_model = init_model or
restart_model or finetune_model so finetune can incorrectly pick init/restart
checkpoints; change this by (a) validating inputs up front in the function that
defines init_model/restart_model/finetune_model and raise an error if more than
one of those is set, OR (b) keep the existing resume_model variable but in the
finetune branch explicitly load weights from finetune_model (not resume_model)
and use that checkpoint to populate descriptor/fitting weights and
_extra_state["model_params"]; update both the initial resume/resuming block
(resume_model/resuming) and the finetune-specific code region (~lines 487-527)
to follow the chosen approach so finetune never inherits init/restart weights.
- Around line 991-1002: The log attempts to convert CUDA tensors returned by
_model.get_out_bias() to numpy via np.asarray which raises RuntimeError on CUDA;
replace the np.asarray(...) calls with to_numpy_array(...) from
deepmd.dpmodel.common when building the log message after calling
_model.change_out_bias (and similarly anywhere else you call np.asarray on
_model.get_out_bias()), so call to_numpy_array(old_bias).reshape(-1) and
to_numpy_array(new_bias).reshape(-1) (slicing by len(model_type_map) as before)
to ensure device-safe conversion for logging.
In `@deepmd/pt_expt/utils/finetune.py`:
- Around line 35-40: The code currently falls back to returning only
{"type_map": ...} when serialize_from_file(finetune_model) lacks "model_params",
which silently allows change_model_params=True to proceed with incomplete
config; modify the logic in the finetune model-loading blocks (where
serialize_from_file, finetune_model is used and again in the block around lines
79-92) to detect when change_model_params is True and "model_params" is missing,
and immediately raise a clear error (including mention of using
--use-pretrain-script or that legacy .pte lacks model_params.json) instead of
returning the minimal dict; ensure the error path prevents calling
get_finetune_rule_single with incomplete input.
In `@source/tests/consistent/model/test_ener.py`:
- Around line 1333-1423: The test wrongly sets dp_std_orig =
to_numpy_array(dp_model.get_out_bias()) instead of snapshotting the original
out_std and then never asserts remapping for old types; change the snapshot to
dp_std_orig = to_numpy_array(dp_model.atomic_model.out_std) (and similarly
ensure any other std snapshots use atomic_model.out_std), seed a non-trivial
out_std on dp_model before change_type_map, then add assertions that the
remapped old entries land at indices 3 and 0 (compare dp_std_new[:, 3, :] to
dp_std_orig[:, 0, :] for "O" and dp_std_new[:, 0, :] to dp_std_orig[:, 1, :] for
"H"), keep cross-backend equality checks (pt_model, pt_expt_model) and remove or
use any now-unused locals; run ruff check . and ruff format . before committing.
In `@source/tests/pt_expt/test_finetune.py`:
- Around line 564-580: The loop comparing ft_state and pre_state must, when
random_fitting is True, assert that fitting tensors are not all identical:
locate the loop over ft_state and the variables ft_state, pre_state and
random_fitting; gather keys containing ".fitting" present in both ft_state and
pre_state and assert that at least one of those tensors differs (e.g., by
checking torch.any(ft_state[k] != pre_state[k]) for at least one k), failing the
test if all fitting tensors are equal.
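The last suggestion, asserting that randomly re-initialized fitting weights actually differ from the pretrained ones, can be sketched with numpy stand-ins for the torch tensors (the helper name is hypothetical):

```python
import numpy as np

def assert_fitting_reinitialized(ft_state, pre_state):
    """Sketch of the stronger check: when random_fitting is requested,
    at least one shared fitting tensor must differ from the pretrained
    weights, otherwise the test would pass vacuously."""
    fitting_keys = [k for k in ft_state if ".fitting" in k and k in pre_state]
    assert fitting_keys, "no shared fitting tensors to compare"
    assert any(
        np.any(ft_state[k] != pre_state[k]) for k in fitting_keys
    ), "all fitting tensors equal the pretrained weights"
```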
📒 Files selected for processing (7)
- deepmd/dpmodel/atomic_model/base_atomic_model.py
- deepmd/pt_expt/entrypoints/main.py
- deepmd/pt_expt/train/training.py
- deepmd/pt_expt/utils/finetune.py
- deepmd/pt_expt/utils/serialization.py
- source/tests/consistent/model/test_ener.py
- source/tests/pt_expt/test_finetune.py
Older .pte files (or those produced by external code calling deserialize_to_file without model_params) lack the embedded model_params.json. When --use-pretrain-script is used with such files, get_finetune_rule_single would crash with a KeyError on "descriptor". Add an explicit check with a clear error message.
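A minimal sketch of such a check (the function and key names are assumptions; only the embedded-model_params convention comes from this PR):

```python
def load_pretrain_params(data: dict, finetune_model: str,
                         use_pretrain_script: bool) -> dict:
    """Sketch of the explicit check: legacy .pte archives carry no
    embedded model_params.json, so fail loudly instead of crashing
    later with a KeyError on "descriptor"."""
    if "model_params" in data:
        return data["model_params"]
    if use_pretrain_script:
        raise RuntimeError(
            f"{finetune_model} has no embedded model_params "
            "(legacy .pte without model_params.json); re-freeze the "
            "model or drop --use-pretrain-script."
        )
    # minimal fallback: only the type map survives
    return {"type_map": data["type_map"]}
```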
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5331      +/-   ##
==========================================
+ Coverage   82.40%   82.42%   +0.02%
==========================================
  Files         783      784       +1
  Lines       79031    79124      +93
  Branches     3675     3675
==========================================
+ Hits        65122    65219      +97
+ Misses      12736    12731       -5
- Partials     1173     1174       +1

☔ View full report in Codecov by Sentry.
- Reject combining finetune_model with init_model/restart_model
- Use to_numpy_array instead of np.asarray in model_change_out_bias for CUDA tensor safety
- Remove unused variables dp_std_orig/dp_std_before in test_ener.py
- Add out_std remap correctness assertion for old types
- Assert fitting weights differ (not just skip) for random_fitting=True, excluding bias_atom_e which is set by bias adjustment
🧹 Nitpick comments (1)
source/tests/pt_expt/test_finetune.py (1)
122-132: Minor: Redundant import.
`shutil` is already imported at line 16; the local `import shutil as _shutil` on line 124 is unnecessary.

Suggested fix:

```diff
 def _subsample_data(src_dir: str, dst_dir: str, nframes: int = 2) -> None:
     """Copy a data system, keeping only the first *nframes* frames."""
-    import shutil as _shutil
-
-    _shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)
+    shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)
     set_dir = os.path.join(dst_dir, "set.000")
```
📒 Files selected for processing (3)

- deepmd/pt_expt/train/training.py
- source/tests/consistent/model/test_ener.py
- source/tests/pt_expt/test_finetune.py
✅ Files skipped from review due to trivial changes (1)
- source/tests/consistent/model/test_ener.py
…in finetune tests

Replace np.asarray() with to_numpy_array() when converting model bias tensors to numpy arrays. np.asarray() fails on CUDA tensors with "can't convert cuda:0 device type tensor to numpy", while to_numpy_array() handles device transfer automatically.
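The device-safe conversion can be sketched as follows (a simplified stand-in for `deepmd.dpmodel.common.to_numpy_array`, whose real implementation may differ):

```python
import numpy as np

def to_numpy_array(x):
    """Sketch: move a possibly-CUDA torch tensor to CPU before handing
    it to numpy, since np.asarray() raises on CUDA tensors."""
    if x is None:
        return None
    if hasattr(x, "detach"):  # torch.Tensor, possibly on a GPU
        return x.detach().cpu().numpy()
    return np.asarray(x)
```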
# Conflicts:
#   deepmd/pt_expt/utils/serialization.py
Summary
- Add `--finetune`, `--model-branch`, and `--use-pretrain-script` support to `dp --pt-expt train`, mirroring the pt backend's finetune flow (load pretrained checkpoint, change type map, selective weight copy, output bias adjustment).
- Accept both `.pt` checkpoints and frozen `.pte` models as the pretrained source (embed `model_params` in `.pte` during freeze for `--use-pretrain-script`).
- Fix a bug in `base_atomic_model.change_type_map` where `out_bias`/`out_std` were not extended before remapping when the new type map introduces unseen types, causing `IndexError` with negative remap indices.

Usage examples
Files changed
- `deepmd/pt_expt/utils/finetune.py`: `get_finetune_rules()` for pt_expt, supports `.pt` and `.pte`
- `deepmd/pt_expt/entrypoints/main.py`: wire `--finetune`/`--model-branch`/`--use-pretrain-script` through `train()` → `get_trainer()` → `Trainer`; pass `model_params` to `.pte` during freeze
- `deepmd/pt_expt/train/training.py`: finetune weight transfer in `Trainer.__init__` (`.pt` and `.pte`); `model_change_out_bias()`
- `deepmd/pt_expt/utils/serialization.py`: embed `model_params.json` in the `.pte` archive
- `deepmd/dpmodel/atomic_model/base_atomic_model.py`: fix `change_type_map` to extend `out_bias`/`out_std` for new types (array-api compatible)
- `source/tests/pt_expt/test_finetune.py`: tests for `.pte` finetune, `--use-pretrain-script`, `random_fitting`, inherited weight consistency
- `source/tests/consistent/model/test_ener.py`: `test_change_type_map_new_type` verifying `out_bias`/`out_std` extension across dp, pt, pt_expt

Test plan
- `python -m pytest source/tests/pt_expt/test_finetune.py -v` (9 passed)
- `python -m pytest source/tests/pt_expt/test_training.py -v` (11 passed, no regression)
- `python -m pytest source/tests/consistent/model/test_ener.py -k change_type_map -v` (3 passed)
- `python -m pytest source/tests/consistent/descriptor/test_se_e2_a.py -v` (351 passed, no regression)